Observability stack #11
base: main
Signed-off-by: omrishiv <[email protected]>
The observability stack is built upon:
- Prometheus - metrics
- Loki - logging
- Promtail - log delivery
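One common way these pieces are connected is to register Prometheus and Loki as Grafana data sources. The provisioning file below is a minimal sketch of that wiring, assuming Grafana's standard data source provisioning format; the in-cluster service URLs are hypothetical placeholders, not the names used in this PR.

```yaml
# Grafana data source provisioning (sketch; service URLs are hypothetical)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend proxies the queries
    url: http://prometheus-server.prometheus.svc.cluster.local  # hypothetical
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.loki.svc.cluster.local             # hypothetical
```

Promtail does not appear here because it only ships logs into Loki; Grafana queries Loki directly.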
I'm interested in why Promtail is used rather than something like Fluent Bit or the OpenTelemetry Collector.
Promtail is really easy to use if only Loki is being used. That being said, I may look at switching to Fluent Bit, as I've used it in another project recently.
Is there a compelling reason to move from Promtail to either? Grafana Agent could be another option as well.
Fluent Bit is more popular among EKS end users, and it is also supported when using Fargate.
OpenTelemetry for logs is very new, and not many end users have adopted it.
It should be a pretty simple swap to Fluent Bit. Let me take a look at it.
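As a rough illustration of how small that swap could be, here is a sketch of Helm values for the upstream `fluent/fluent-bit` chart that point Fluent Bit's built-in `loki` output at the existing Loki. It assumes the chart's default `tail` input, which tags container logs as `kube.*`; the Loki host and port are hypothetical placeholders.

```yaml
# Helm values for the fluent/fluent-bit chart (sketch, not this PR's config).
# Only the output section is overridden; the chart's default inputs/filters
# already tail container logs and tag them kube.*.
config:
  outputs: |
    [OUTPUT]
        Name                    loki
        Match                   kube.*
        Host                    loki-gateway.loki.svc.cluster.local
        Port                    80
        Labels                  job=fluent-bit
        Auto_Kubernetes_Labels  on
```

The host above is hypothetical; Loki's HTTP API usually listens on 3100 when addressed directly rather than through a gateway.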
@@ -0,0 +1,50 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
I wonder if we could move these to ApplicationSets, now that idpbuilder supports them, to make it easier for folks to adopt this in production?
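A minimal sketch of what that could look like: a single ApplicationSet with a list generator stamping out one Argo CD Application per component. The repository URL, component names, and directory paths are hypothetical placeholders, not this PR's layout.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: observability
  namespace: argocd
spec:
  generators:
    - list:
        elements:             # one element per component (names hypothetical)
          - component: prometheus
          - component: loki
          - component: grafana
  template:
    metadata:
      name: '{{component}}'   # Application name taken from the list element
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/stacks.git  # hypothetical
        targetRevision: HEAD
        path: 'observability/{{component}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{component}}'
      syncPolicy:
        automated: {}
        syncOptions:
          - CreateNamespace=true
```

Compared with separate Application manifests, adding a component becomes a one-line change to `elements`.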
That is definitely something Manabu and I spoke about. I was waiting for the example to see how to conform to it.
I don't know if it was already discussed, but I would question Loki too. Loki is licensed under the AGPL; I would prefer OpenSearch, which is Apache 2.0.
A typical fully open-source stack I see for logs and traces is Fluent Bit + OpenSearch.
For metrics, I see the OpenTelemetry Collector as a DaemonSet + Prometheus.
@csantanapr thanks for raising this.
I am not a lawyer, so please correct me if I'm wrong: my understanding of the AGPL is that we can't modify the application's source without the copyleft obligations applying. I believe many have built observability stacks on top of the Grafana stack, whose core components (Loki, Grafana, Tempo, Mimir) are AGPL. As we are not modifying the source, we should be OK to use it. Again, please please please correct me if I'm wrong.
This is a valid concern, and I believe the flexibility of working in stacks allows us to create another implementation that relies on other tooling. What I do believe is that we need to come to an agreement on what our opinionated stack is. If this is not it, I'm OK with that, but let's discuss it during the next community meeting so we can figure out how we want to proceed.
We discussed this, and yes: since Grafana is already AGPL, choosing Loki as another AGPL project is less of a concern. That said, I agree with the discussion above that we should also think about using OpenSearch given its popularity. Publishing it as an alternative observability stack sounds good.
I will say we do use the OTel Collector as a DaemonSet for logs and Prometheus metrics, and eventually for traces.
I understand that the community is much more invested in Fluent Bit for logging, so a standard of OpenSearch + Fluent Bit, with an OpenTelemetry Collector DaemonSet for Prometheus metrics and OTel traces, seems like a good pattern to me.
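To make the metrics/traces half of that pattern concrete, here is a sketch of an OpenTelemetry Collector configuration for the DaemonSet, with logs left to Fluent Bit. It assumes the `otelcol-contrib` distribution (which ships the `prometheusremotewrite` exporter) and a Prometheus started with `--web.enable-remote-write-receiver`; all endpoints are hypothetical placeholders.

```yaml
receivers:
  prometheus:                      # scrape Prometheus-format metrics
    config:
      scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod            # a real DaemonSet would restrict this to its own node
  otlp:                            # accept OTel traces from workloads
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  prometheusremotewrite:
    endpoint: http://prometheus-server.prometheus.svc.cluster.local/api/v1/write  # hypothetical
  otlp:
    endpoint: traces-backend.observability.svc.cluster.local:4317                # hypothetical
    tls:
      insecure: true               # plaintext in-cluster hop; enable TLS in production
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

As written, every DaemonSet pod would scrape every pod in the cluster; limiting discovery to the local node avoids duplicate series.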
If it makes sense, I suggest:
- Swapping Promtail for Fluent Bit and keeping the rest of this stack the same. This gives us a Grafana-based stack to work with (though there's an argument to be made for swapping Promtail for Grafana Agent so we are closer to the LGTM implementation).
- Creating another stack based on OpenSearch + Fluent Bit, with OTel/Prometheus, as an alternative. This gives us the opportunity to test how substitutable stacks work.
We can have both live under /observability, as /observability/grafana-stack and /observability/otel-stack.
I'm OK working on all of this and would be happy to put the OTel stack together as well. Does that make sense to do?
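With the proposed /observability/grafana-stack and /observability/otel-stack layout, one option is an ApplicationSet using Argo CD's Git directory generator, which creates one Application per matching subdirectory. This is a sketch only; the repository URL is a hypothetical placeholder.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: observability-stacks
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example-org/stacks.git  # hypothetical
        revision: HEAD
        directories:
          - path: observability/*   # matches grafana-stack and otel-stack
  template:
    metadata:
      name: '{{path.basename}}'     # e.g. "grafana-stack"
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/stacks.git  # hypothetical
        targetRevision: HEAD
        path: '{{path}}'            # full matched path, e.g. observability/otel-stack
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
```

The glob deploys both stacks side by side; listing a single path such as `observability/grafana-stack` instead would pick one implementation.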
@omrishiv
I agree with having separate stacks like /observability/grafana-stack and /observability/otel-stack.
@csantanapr @blakeromano What are your thoughts on this?
Adding the initial implementation of the observability stack. This includes:
It uses the ref-implementation for SSO, with user1 able to log in to Grafana as an admin. New users can be created and assigned roles for various purposes.
Addresses #10
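For readers wiring up the SSO piece, here is a sketch of Grafana's generic OAuth settings as Helm values for the `grafana/grafana` chart. It assumes the ref-implementation's identity provider is Keycloak and that its tokens carry a `groups` claim; the hostnames, realm, client ID, and group name are all hypothetical.

```yaml
grafana.ini:
  server:
    root_url: https://grafana.example.local          # hypothetical
  auth.generic_oauth:
    enabled: true
    name: Keycloak
    client_id: grafana                               # hypothetical client
    # client_secret is supplied out of band via the
    # GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET environment variable
    scopes: openid email profile
    auth_url: https://keycloak.example.local/realms/cnoe/protocol/openid-connect/auth
    token_url: https://keycloak.example.local/realms/cnoe/protocol/openid-connect/token
    api_url: https://keycloak.example.local/realms/cnoe/protocol/openid-connect/userinfo
    # JMESPath: members of the hypothetical "admin" group get Grafana Admin
    role_attribute_path: contains(groups[*], 'admin') && 'Admin' || 'Viewer'
```

Mapping roles from a group claim keeps role assignment in the identity provider, so new users get the right Grafana role without manual edits in Grafana.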